The Distributional Similarity Of Sub-Parses
Abstract
This work explores computing distributional similarity between sub-parses, i.e., fragments of a parse tree, as an extension to general lexical distributional similarity techniques. In the same way that lexical distributional similarity is used to estimate lexical semantic similarity, we propose using distributional similarity between sub-parses to estimate the semantic similarity of phrases. Such a technique will allow us to identify paraphrases whose component words are not semantically similar. We demonstrate the potential of the method by applying it to a small number of examples and showing that the paraphrases are more similar than the non-paraphrases.
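The idea of comparing sub-parses by the contexts they occur in can be illustrated with a minimal sketch. The abstract does not say which similarity measure or context features the authors use, so cosine similarity over sparse count vectors is an assumption here, and the phrase templates, feature names, and counts are hypothetical toy data rather than results from the paper.

```python
from collections import Counter
from math import sqrt


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    # Only features present in both vectors contribute to the dot product.
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: each sub-parse (written as a template) maps to
# counts of the contexts in which it was observed. Real counts would come
# from a parsed corpus; these numbers are invented for illustration.
contexts = {
    "X solves Y": Counter(
        {"subj:committee": 3, "obj:problem": 2, "obj:crisis": 1}
    ),
    "X finds a solution to Y": Counter(
        {"subj:committee": 2, "obj:problem": 2, "obj:crisis": 1}
    ),
    "X causes Y": Counter({"subj:virus": 3, "obj:disease": 2}),
}

# Two sub-parses that share most of their contexts (a candidate paraphrase).
paraphrase_score = cosine(
    contexts["X solves Y"], contexts["X finds a solution to Y"]
)

# Two sub-parses with no contexts in common.
unrelated_score = cosine(contexts["X solves Y"], contexts["X causes Y"])

print(f"paraphrase pair: {paraphrase_score:.2f}")
print(f"unrelated pair:  {unrelated_score:.2f}")
```

In this toy setting the pair that shares contexts scores higher than the pair that does not, even though "solves" and "finds a solution to" have no words in common. That is the kind of contrast the abstract reports for paraphrases versus non-paraphrases.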
Similar Resources
From Global to Local Similarities: A Graph-Based Contextualization Method using Distributional Thesauri
After recasting the computation of a distributional thesaurus in a graph-based framework for term similarity, we introduce a new contextualization method that generates, for each term occurrence in a text, a ranked list of terms that are semantically similar and compatible with the given context. The framework is instantiated by the definition of term and context, which we derive from dependenc...
Corpus-based evidence for approximating semantic transparency of complex verbs
lexical concepts. Furthermore, more specific sets of the modifier relation (MO) and its subclasses may model the verbs in a more meaningful way. Finally, future studies will need to address the problem of polysemy of the verbs both in corpus-based evidence and human association scores. To summarize, using dependency parses allows us to exploit verbs with separated verb particles. This enlarges ...
UCAM-CORE: Incorporating structured distributional similarity into STS
This paper describes methods that were submitted as part of the *SEM shared task on Semantic Textual Similarity. Multiple kernels provide different views of syntactic structure, from both tree and dependency parses. The kernels are then combined with simple lexical features using Gaussian process regression, which is trained on different subsets of training data for each run. We found that the ...
A Framework for Compiling High Quality Knowledge Resources From Raw Corpora
The identification of various types of relations is a necessary step to allow computers to understand natural language text. In particular, the clarification of relations between predicates and their arguments is essential because predicate-argument structures convey most of the information in natural languages. To precisely capture these relations, wide-coverage knowledge resources are indispe...
FBK: Machine Translation Evaluation and Word Similarity metrics for Semantic Textual Similarity
This paper describes the participation of FBK in the Semantic Textual Similarity (STS) task organized within SemEval 2012. Our approach explores lexical, syntactic and semantic machine translation evaluation metrics combined with distributional and knowledge-based word similarity metrics. Our best model achieves 60.77% correlation with human judgements (Mean score) and ranked 20 out of 88 submit...